Abstract

We show that the structure of a growing tree preserves information on the shape
of the initial graph. For exponential trees, evidence of this kind of memory is
provided by iterative equations derived for the moments of the node-node
distance distribution. Numerical calculations confirm the result and allow us
to extend the conclusion to Barabási-Albert scale-free trees. The memory effect
almost disappears if subsequent nodes are connected to the network with more
than one link.
1 Introduction



The problem of growing trees belongs to a larger class of problems of evolving
networks, a new area with many interdisciplinary applications, from biology
and computational science to linguistics [1,2,3]. In statistical mechanics, we
often investigate the state of thermodynamic equilibrium, which is unique
and therefore cannot preserve any information. In other sciences, however,
memory of past states is an essential ingredient of the system. Here we are
interested in how the structure of the origin of a tree, i.e. of the graph
from which the tree is constructed, influences the overall characteristics of
the growing system.

A network containing N nodes is fully characterized by its connectivity matrix
C: $c_N(i,j) = 1$ if the nodes i, j are linked together, and $c_N(i,j) = 0$
otherwise. More convenient but somewhat redundant is the distance matrix S,
where the matrix element $s_N(i,j)$ is the number of links along the shortest
path from i to j. It is often simpler to describe a network statistically.
Local characteristics of a network include the degree distribution, i.e. the
probability that a node is linked to a given number k of neighbors. Global
characteristics include the node-node distance distribution. Whereas the former
can be treated as complete only conditionally [4], little is known about the
latter. Recent progress in knowledge of the mean node-node distance
$d = [\langle s_N(i,j) \rangle]$ is due to applications of equilibrium
statistical mechanics, scaling hypotheses and/or assumptions of a lack of
correlations between nodes [5,6,7,8]. Here, $\langle \cdots \rangle$ denotes an
average over the $N(N-1)$ non-diagonal matrix elements and $[\cdots]$ is an
average over different matrices, i.e. different graphs.

By growing we mean adding subsequent nodes to an already existing graph.
When each node is added with one link only (m = 1), a tree is formed, i.e. a
compact graph without loops and without multiple edges. In a tree, the path
between any two nodes is unique, and it cannot be changed during the growth
process. When a node is added, the node-node distance matrix S is enlarged
by one column and one row. Once the matrix elements are formed, they do
not change their values. However, if nodes are added with two or more links
(m > 1), shortcuts are formed and some node-node distances may be shortened.

The main goal of this work is to demonstrate that the node-node distance
distribution of a growing tree preserves information on the structure of the
initial tree from which it is formed.

Below we deal with two kinds of growing trees, which differ in their degree
distribution. Let us consider the linking of new nodes to randomly selected
nodes. When the selection is made without any preference, we obtain a so-called
exponential tree. In this case, the degree distribution is $P(k) = 2^{-k}$,
where k is the number of links of a node. Nodes can also be selected with some
preference with respect to their degree. If the linking probability is
proportional to the degree k, we obtain the scale-free or Barabási-Albert
networks [9]. In this case, $P(k) \propto k^{-\gamma}$, with $\gamma = 3$ [1,2,3].

To achieve our goal, the simplest method is to calculate the mean node-node
distance d(N) for trees of N nodes whose formation has started from
two different trees with four nodes. This is done in the next section with
iterative equations, which have been derived recently for the exponential trees
[10]. In Section 3, the growth algorithms are introduced, based on the evolution
of the distance matrix. In Section 4, numerical results are presented for
the exponential trees and the Barabási-Albert scale-free trees. We also show
that the memory of the ancestral network is much reduced if the trees are
replaced by graphs with cyclic paths, i.e. with m > 1. The last section is
devoted to discussion.



2 Weights of exponential trees 

Consider the probability that a tree of a given structure is grown. Trees are
different if there is no one-to-one correspondence between their pairs of linked
nodes [11]. Let us denote the number of different trees with N nodes by K(N).
It is easy to check by inspection that K(2) = K(3) = 1 and K(4) = 2. As
K(3) = 1, the probability (or weight) of the tree of three nodes (Fig. 1(a))
must be one. An exponential tree of four nodes can be formed by linking a
new (fourth) node either to one of the two end nodes or to the central one. Then
the probability of a chain of nodes (Fig. 1(b)) is 2/3, and the probability of a
star-like tree (Fig. 1(c)) is 1/3. From the chain, a longer chain (Fig. 1(d)) can
be produced in two ways, so its weight is 2/3 · 2/4 = 1/3. From the star,
another star (Fig. 1(f)) can appear with probability 1/3 · 1/4 = 1/12. The
remaining tree (Fig. 1(e)) can be formed from either the chain or the star,
so its weight is 2/3 · 2/4 + 1/3 · 3/4 = 7/12. We note that in the case of
the scale-free trees, the weights of the trees presented in Fig. 1(a)-(f) are
1, 1/2, 1/2, 1/6, 7/12 and 1/4, respectively. This is a simple demonstration
that the weights of trees in the two classes are different.
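The weights quoted above can be reproduced by a short exact enumeration. The sketch below is only an illustration under stated assumptions: the function name `shape_weights` is hypothetical, growth starts from the three-node chain, and tree shapes are identified by their sorted degree sequence, which separates all trees only up to N = 5.

```python
from collections import defaultdict
from fractions import Fraction


def shape_weights(n_final, preferential=False):
    """Exact weights of tree shapes with n_final <= 5 nodes.

    A state is the tuple of node degrees in order of node creation.
    The attachment probability depends on degrees only, so this is
    enough to follow both growth rules:
      preferential=False : uniform choice of the target node (exponential tree)
      preferential=True  : choice proportional to the degree (scale-free tree)
    Shapes are labelled by the sorted degree sequence (an assumption that
    is valid only up to five nodes).
    """
    states = {(1, 2, 1): Fraction(1)}  # three-node chain, Fig. 1(a)
    for _ in range(n_final - 3):
        grown_states = defaultdict(Fraction)
        for degrees, weight in states.items():
            norm = sum(degrees) if preferential else len(degrees)
            for q, k in enumerate(degrees):
                p = Fraction(k if preferential else 1, norm)
                grown = degrees[:q] + (k + 1,) + degrees[q + 1:] + (1,)
                grown_states[grown] += weight * p
        states = grown_states
    shapes = defaultdict(Fraction)
    for degrees, weight in states.items():
        shapes[tuple(sorted(degrees))] += weight
    return dict(shapes)


CHAIN5 = (1, 1, 2, 2, 2)   # Fig. 1(d)
MIXED5 = (1, 1, 1, 2, 3)   # Fig. 1(e)
STAR5 = (1, 1, 1, 1, 4)    # Fig. 1(f)

exponential = shape_weights(5)
scale_free = shape_weights(5, preferential=True)
print(exponential[CHAIN5], exponential[MIXED5], exponential[STAR5])
print(scale_free[CHAIN5], scale_free[MIXED5], scale_free[STAR5])
```

For larger N the degree sequence no longer identifies a tree uniquely, so a proper isomorphism test would be needed in place of the sorted degrees.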

Any possible tree can be formed from the tree of three nodes (Fig. 1(a)). The way
to form chains and stars is unique, and so their weights are relatively small.
For example, the weight of an exponential star of N nodes is 2/(N - 1)!.
We could eliminate stars if we grew trees from the chain shown in Fig. 1(b).
Seemingly, the weights of the other trees should not change much, but all of
them are influenced by the lack of stars. For example, in this case the
tree shown in Fig. 1(e) can be formed in only one way. As a consequence, the
whole distribution of weights is rebuilt. With the iterative equations derived
recently [10], we can calculate the mean distance d and the mean square of
distances $e = [\langle s_N^2(i,j) \rangle]$ for two "families" of trees. One is
formed from the chain-like tree shown in Fig. 1(b) and labeled "Z", and the
other from the star-like tree presented in Fig. 1(c) and labeled "Y". Thus the
first "family" does not contain stars, and the second one does not contain
chains. The equations are:

Fig. 1. Examples of trees. The Z-like chain (b) and the Y-like star (c) are the
ancestors of the two "families" of growing networks described in the text.

The information on the initial trees is encoded in the initial values of d(4) and
e(4). It is easy to check that for the chain $d_Z(4) = 5/3$, $e_Z(4) = 10/3$, and
for the star $d_Y(4) = 3/2$, $e_Y(4) = 5/2$.
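The role of these initial values can be illustrated with a short iteration. The sketch below assumes recursions for d(N) and e(N) that follow from attaching each new node to a uniformly chosen existing node. They are our reconstruction and may differ in form from Eqs. (1) of [10]. The function name `iterate_moments` is hypothetical.

```python
from fractions import Fraction


def iterate_moments(d4, e4, n_final):
    """Iterate d(N) and e(N) from N = 4 to n_final for exponential trees.

    Assumed recursions (reconstructed from uniform attachment,
    not copied from [10]):
      d(N+1) = a(N) d(N) + 2/(N+1)
      e(N+1) = a(N) e(N) + 4(N-1)/(N(N+1)) d(N) + 2/(N+1)
    with a(N) = (N+2)(N-1) / (N(N+1)).
    """
    d, e = Fraction(d4), Fraction(e4)
    for n in range(4, n_final):
        a = Fraction((n + 2) * (n - 1), n * (n + 1))
        shift = Fraction(2, n + 1)
        # Both right-hand sides use the old value of d.
        d, e = (a * d + shift,
                a * e + Fraction(4 * (n - 1), n * (n + 1)) * d + shift)
    return d, e


n_final = 1000
d_z, e_z = iterate_moments(Fraction(5, 3), Fraction(10, 3), n_final)  # chain "Z"
d_y, e_y = iterate_moments(Fraction(3, 2), Fraction(5, 2), n_final)   # star "Y"
print(float(d_z - d_y), float(e_z - e_y))
```

Exact rational arithmetic is used so that small differences between the two families are not blurred by rounding. For very large N, floating-point numbers would be the practical choice.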

A similar method has been used in [12,13]. The difference is that here Eqs.
(1) are exact, but they apply only to the exponential trees.



3 Numerical algorithm 

Two initial trees with four nodes (the chain and the star) are represented in
the computer memory as two distance matrices $S_4(Z)$ and $S_4(Y)$:

$$
S_4(Z) = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 1 & 0 & 1 & 2 \\ 2 & 1 & 0 & 1 \\ 3 & 2 & 1 & 0 \end{pmatrix}
\quad \text{and} \quad
S_4(Y) = \begin{pmatrix} 0 & 1 & 2 & 2 \\ 1 & 0 & 1 & 1 \\ 2 & 1 & 0 & 2 \\ 2 & 1 & 2 & 0 \end{pmatrix}
$$

for the chain and the star, respectively.
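As a quick consistency check, the off-diagonal averages of these two matrices should reproduce the initial values d(4) and e(4) quoted in Section 2. The sketch below is illustrative. The helper name `distance_moments` is hypothetical, and the labelling of the star (second node as the centre) is an assumption that does not affect the averages.

```python
from fractions import Fraction

# Distance matrices of the four-node chain (Z) and star (Y).
S4_Z = [[0, 1, 2, 3],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [3, 2, 1, 0]]
S4_Y = [[0, 1, 2, 2],   # assumed labelling: the second node is the centre
        [1, 0, 1, 1],
        [2, 1, 0, 2],
        [2, 1, 2, 0]]


def distance_moments(S):
    """Mean distance and mean squared distance over the N(N-1) off-diagonal elements."""
    n = len(S)
    pairs = n * (n - 1)
    off_diagonal = [S[i][j] for i in range(n) for j in range(n) if i != j]
    d = Fraction(sum(off_diagonal), pairs)
    e = Fraction(sum(s * s for s in off_diagonal), pairs)
    return d, e


print(distance_moments(S4_Z))  # moments for the chain
print(distance_moments(S4_Y))  # moments for the star
```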

Selecting a node to which a new node is linked is equivalent to selecting the
number q of a column/row of the matrix. The matrix is then supplemented by a new
column and row, which are copies of the q-th column/row but with all elements
incremented by one:

$$ \forall\, 1 \le i \le N: \quad s_{N+1}(N+1, i) = s_{N+1}(i, N+1) = s_N(q, i) + 1, \tag{2a} $$

and obviously

$$ s_{N+1}(N+1, N+1) = 0. \tag{2b} $$
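A minimal sketch of this update, with 0-based node indices and hypothetical function names, could look as follows. It covers only the exponential case, where q is drawn uniformly.

```python
import random


def add_node_tree(S, q):
    """Return the distance matrix after attaching a new node to node q (0-based)."""
    n = len(S)
    new_row = [S[q][i] + 1 for i in range(n)]              # Eq. (2a)
    grown = [row + [new_row[i]] for i, row in enumerate(S)]
    grown.append(new_row + [0])                            # Eq. (2b)
    return grown


def grow_exponential_tree(S, n_final, rng=random):
    """Attach nodes to uniformly chosen existing nodes until n_final nodes exist."""
    while len(S) < n_final:
        S = add_node_tree(S, rng.randrange(len(S)))
    return S


S4_Z = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]  # four-node chain
S5 = add_node_tree(S4_Z, 3)   # attach the fifth node to the end of the chain
print(S5[4])
```

The input matrix is not modified. Each call builds a new list of rows, which keeps the sketch simple at the cost of extra copying.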
Eq. (2a) served in the derivation of the iterative formulas (1) [10].
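The step from Eq. (2a) to a recursion for the mean distance can be sketched as follows. This is our reading of the argument, assuming that q is drawn uniformly from the N existing nodes (exponential trees). $D_N$ is an auxiliary symbol introduced here, and the resulting form is not guaranteed to coincide with the notation of Eqs. (1) in [10].

```latex
\begin{align*}
D_N &\equiv \sum_{i \neq j} s_N(i,j) = N(N-1)\,\langle s_N(i,j) \rangle, \\
D_{N+1} &= D_N + 2 \sum_{i=1}^{N} \bigl( s_N(q,i) + 1 \bigr)
  && \text{(new row and column, Eq. (2a))}, \\
\frac{1}{N} \sum_{q=1}^{N} \sum_{i=1}^{N} s_N(q,i) &= \frac{D_N}{N}
  && \text{(average over a uniformly chosen } q\text{)}, \\
(N+1)N\, d(N+1) &= (N+2)(N-1)\, d(N) + 2N, \\
d(N+1) &= \frac{(N+2)(N-1)}{N(N+1)}\, d(N) + \frac{2}{N+1}.
\end{align*}
```

As a check, $d_Z(4) = 5/3$ gives $d_Z(5) = 19/10$, which agrees with averaging the two five-node trees reachable from the chain (mean distances 2 and 9/5, each with weight 1/2).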
The same numerical technique is also applied to the case of the Barabási-Albert
scale-free trees. The only difference is that in this case the node q is
selected with a preference given by the number of its pre-existing links.
Namely, node q is chosen with probability

$$ \frac{k(q)}{\sum_{i=1}^{N} k(i)}, $$

where k(i) is the number of links of the i-th node. An additional matrix r(i)
contains the indices of the rows of the distance matrix S where a "1" is
encountered. Each case $s_N(i,j) = 1$ indicates a link between nodes i and j.
The matrix r(i) is useful for selecting nodes of a given degree for the
scale-free trees and graphs, according to the so-called Kertész algorithm [14].
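One way to read this construction is sketched below. If r lists the row index of every "1" in S, then node i appears in r exactly k(i) times, and a uniform draw from r selects a node with probability proportional to its degree. The function names and the flat-list layout of r are assumptions made for illustration. The details of the algorithm in [14] may differ.

```python
import random


def link_endpoints(S):
    """List r: the row index of every entry equal to 1 in the distance matrix S."""
    return [i for i, row in enumerate(S) for s in row if s == 1]


def select_preferential(S, rng=random):
    """Pick a node (0-based) with probability proportional to its degree."""
    return rng.choice(link_endpoints(S))


S4_Y = [[0, 1, 2, 2], [1, 0, 1, 1], [2, 1, 0, 2], [2, 1, 2, 0]]  # four-node star
r = link_endpoints(S4_Y)
print(r)                          # the centre (index 1) appears three times
print(select_preferential(S4_Y))
```

Rebuilding r from S at every step costs of the order of N^2 operations. In a longer simulation one would rather append the two endpoints of each new link to r as the network grows.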

Further, the same technique is applied to simple graphs, where new nodes are
attached to previously existing ones by m = 2 links. Then cyclic paths are
possible and the distance matrix S has to be rebuilt when each node is added.
The algorithm is as follows. Let us suppose that the (N+1)-th node is attached
to existing nodes p and $q \neq p$. Then

$$ \forall\, 1 \le i, j \le N: \quad s_{N+1}(i,j) = \min\bigl( s_N(i,j),\; s_N(i,p) + 2 + s_N(q,j),\; s_N(i,q) + 2 + s_N(p,j) \bigr). \tag{3a} $$

For the new, (N+1)-th, column/row

$$ \forall\, 1 \le i \le N: \quad s_{N+1}(N+1, i) = s_{N+1}(i, N+1) = \min\bigl( s_N(p,i),\; s_N(q,i) \bigr) + 1, \tag{3b} $$

and again for the diagonal element

$$ s_{N+1}(N+1, N+1) = 0. \tag{3c} $$

One step of the construction of the matrix S for simple graphs (m = 2) is
presented in Fig. 2. An example of the construction of S for trees (m = 1) is
given in [10].
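The update for m = 2 can be sketched in a few lines. The version below is a minimal illustration with 0-based indices and a hypothetical function name. It recomputes all N^2 old elements according to Eq. (3a) and then appends the new row and column according to Eqs. (3b) and (3c).

```python
def add_node_graph(S, p, q):
    """Attach a new node to distinct nodes p and q (0-based) and update all distances."""
    if p == q:
        raise ValueError("p and q must be different nodes")
    n = len(S)
    # Eq. (3a): an old distance may be shortened by a path through the new node.
    grown = [[min(S[i][j],
                  S[i][p] + 2 + S[q][j],
                  S[i][q] + 2 + S[p][j]) for j in range(n)]
             for i in range(n)]
    # Eq. (3b): the new node is one step away from p and from q.
    new_row = [min(S[p][i], S[q][i]) + 1 for i in range(n)]
    for i in range(n):
        grown[i].append(new_row[i])
    grown.append(new_row + [0])   # Eq. (3c)
    return grown


S4_Z = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]  # four-node chain
ring = add_node_graph(S4_Z, 0, 3)   # linking both chain ends closes a five-node ring
for row in ring:
    print(row)
```

In this example the distance between the two former chain ends drops from 3 to 2, which is the kind of shortcut that Eq. (3a) accounts for.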



4 Results of calculations 

In Figs. 3 and 4 the dependences (a) $\Delta_d(N) = d_Z(N) - d_Y(N)$ and (b)
$\Delta_e(N) = e_Z(N) - e_Y(N)$ obtained from growth simulations are presented
for exponential trees and for scale-free trees, respectively. The results of
simulations are averaged over $N_{run} = 10^5$ independent growths. For both
kinds of trees the difference in the average node-node distance tends to a
constant value during the growth process. For higher moments the effect is even
stronger and $\Delta_e$ increases with the tree size N. In Fig. 3 we also give
the results for $\Delta_d(N)$ and $\Delta_e(N)$ calculated with Eq. (1) for
exponential trees of $N = 10^9$ nodes. The fact that $\Delta_d$ and $\Delta_e$
do not decrease with N means that the growing structures preserve the memory of
their initial shapes.

Fig. 2. Construction of the distance matrix S in the case of growing graphs (m = 2).
The gray sites show randomly chosen columns/rows (nodes to which the new node will
be attached). The black sites show matrix elements which are reevaluated from Eq.
(3a) due to newly created shortcuts. The last columns/rows are constructed according
to Eqs. (3b) and (3c). Starting with the Y-like star, new nodes were subsequently
added to nodes (p, q) = ((3, 4), (3, 5), (4, 6), (3, 6), (1, 7)).

In the case of simple graphs (m = 2), the distance matrix S must be reevaluated
at each step, which makes the calculation time substantially longer. The results
for graphs are therefore averaged over only one thousand independent growths.
The curves d(N) and e(N) for both kinds of simple graphs are shown in Fig. 5.
The linear fits for 100 < N < 1000 are d(N) = 0.672 ln(N) + 0.299 and
d(N) = 0.462 ln(N) + 0.888 for the exponential graphs and the scale-free
graphs, respectively. The functions $\Delta_d(N)$ and $\Delta_e(N)$ for both
kinds of evolving graphs are shown in Fig. 6.

For the scale-free graphs, we observe a small memory effect, which manifests
itself as a constant mutual shift of the plots of e(N) vs. ln(N). In this case
it is not clear whether the effect vanishes when N tends to infinity.



5 Discussion 



In the case of the exponential trees, the results of the simulations agree well
with the curves obtained from the iterative equations. This fact supports the
reliability of the numerical results for the scale-free trees and the graphs
with m = 2, where we have no analytical calculations.

The main result of this work is that the node-node distance distribution in a
growing tree depends on its initial structure. Our calculations indicate that
both the average distance d and its second moment e in trees display this kind
of memory. The information is encoded in the constant $C_1$ in the expression
$d = 2\ln(N) + C_1$. The constant $C_1$ varies by about 0.109 and 0.164 when we
change the shape of the initial tree of four nodes from the Y-like star to the
Z-like chain, for the exponential and scale-free trees, respectively. In the
second moment $e = 4\ln^2(N) + C_2 \ln(N) + C_3$, it is the constant $C_2$ which
depends on the initial shape. This is true both for the exponential and the
scale-free trees.

Fig. 3. The functions (a) $\Delta_d(N)$ and (b) $\Delta_e(N)$ for exponential
trees (m = 1), obtained with the iterative formula (1) as well as from direct
growth simulations. The results of simulations are averaged over
$N_{run} = 10^5$ independent growths.

Fig. 4. The functions (a) $\Delta_d(N)$ and (b) $\Delta_e(N)$ for scale-free
trees (m = 1), obtained from the growth simulations. The results are averaged
over $N_{run} = 10^5$ independent growths.

Fig. 5. The functions (a) d(N) and (b) e(N) for the exponential and scale-free
graphs and different initial configurations, obtained from the growth
simulations. The results are averaged over $N_{run} = 10^3$ independent growths.
The dependence on the initial configuration is not visible on the scale of the
plot.

Fig. 6. The functions $\Delta_d(N)$ and $\Delta_e(N)$ for the exponential and
the scale-free graphs, obtained from the growth simulations. The results are
averaged over $N_{run} = 10^3$ independent growths.

The memory effect is much reduced or even disappears when new nodes are linked
to the network by at least two edges. In this case, the distance matrix S is
rebuilt by new edges, which can shorten distances between initially distant
nodes by providing new paths between them.

In conclusion, we have demonstrated that growing trees carry information
on their initial geometrical structure. The significance of this result lies not
only in pure geometry, but also in particular applications of graph theory.
As remarked in the Introduction, the list of examples of such applications is
quite rich [3]. If the considered network is built from the citation index, we
trace the flow of a new idea from one paper to another. We see that networks of
citations are formed around some seminal papers, as happens in the case of
Ref. [9]. Sometimes there are two or more seminal papers, and then the shape
of the network depends on their clarity, ease of mathematical formulation and
individual preferences of the readership, formed in personal contacts. If a new
result spreads just by the reading of papers, it spreads slowly: somebody reads
it and tells a friend, and the friend's student is asked to calculate a similar
problem. The tree is 'chain-like'. By contrast, each conference makes the tree
of spreading ideas more 'star-like', where the possible sources of new
information are multiple and efficient.